{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Analysis of Algorithms\n",
    "\n",
    "Code examples from [Think Complexity, 2nd edition](http://greenteapress.com/wp/complexity2), Appendix A\n",
    "\n",
    "Copyright 2017 Allen Downey, [MIT License](http://opensource.org/licenses/MIT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function, division\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "import os\n",
    "import string\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "import thinkplot\n",
    "\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Empirical order of growth\n",
    "\n",
    "Sometimes we can figure out what order of growth a function belongs to by running it with a range of problem sizes and measuring the run time.\n",
    "\n",
    "To measure runtimes, we'll use `etime`, which uses `os.times` to compute the total time used by a process, including \"user time\" and \"system time\".  User time is time spent running your code; system time is time spent running operating system code on your behalf."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def etime():\n",
    "    \"\"\"Measures user and system time this process has used.\n",
    "\n",
    "    Returns the sum of user and system time.\"\"\"\n",
    "    user, sys, chuser, chsys, real = os.times()\n",
    "    return user+sys"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`time_func` takes a function object and a problem size, `n`, runs the function, and returns the elapsed time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def time_func(func, n):\n",
    "    \"\"\"Run a function and return the elapsed time.\n",
    "    \n",
    "    func: function\n",
    "    n: problem size\n",
    "    \n",
    "    returns: user+sys time in seconds\n",
    "    \"\"\"\n",
    "    start = etime()\n",
    "    func(n)\n",
    "    end = etime()\n",
    "    elapsed = end - start\n",
    "    return elapsed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's an example: a function that makes a list with the given length using `append`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def append_list(n):\n",
    "    t = []\n",
    "    for i in range(n):\n",
    "        t.append(i)\n",
    "    return t\n",
    "\n",
    "time_func(append_list, 1000000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`run_timing_test` takes a function, runs it with a range of problem sizes, and returns two lists: problem sizes and times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def run_timing_test(func, max_time=1):\n",
    "    \"\"\"Tests the given function with a range of values for n.\n",
    "    \n",
    "    func: function object\n",
    "\n",
    "    returns: list of ns and a list of run times.\n",
    "    \"\"\"\n",
    "    ns = []\n",
    "    ts = []\n",
    "    for i in range(10, 28):\n",
    "        n = 2**i\n",
    "        t = time_func(func, n)\n",
    "        print(n, t)\n",
    "        if t > 0:\n",
    "            ns.append(n)\n",
    "            ts.append(t)\n",
    "        if t > max_time:\n",
    "            break\n",
    "\n",
    "    return ns, ts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's an example with `append_list`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ns, ts = run_timing_test(append_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`fit` takes the lists of ns and ts and fits it with a curve of the form `a * n**exp`, where `exp` is a given exponent and `a` is chosen so that the line goes through a particular point in the sequence, usually the last. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def fit(ns, ts, exp=1.0, index=-1):\n",
    "    \"\"\"Fits a curve with the given exponent.\n",
    "    \n",
    "    ns: sequence of problem sizes\n",
    "    ts: sequence of times\n",
    "    exp: exponent of the fitted curve\n",
    "    index: index of the element the fitted line should go through\n",
    "    \n",
    "    returns: sequence of fitted times\n",
    "\n",
    "    \n",
    "    \"\"\"\n",
    "    # Use the element with the given index as a reference point, \n",
    "    # and scale all other points accordingly.\n",
    "    nref = ns[index]\n",
    "    tref = ts[index]\n",
    "\n",
    "    tfit = []\n",
    "    for n in ns:\n",
    "        ratio = n / nref\n",
    "        t = ratio**exp * tref\n",
    "        tfit.append(t)\n",
    "\n",
    "    return tfit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`plot_timing_test` plots the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def plot_timing_test(ns, ts, label='', color='blue', exp=1.0, scale='log'):\n",
    "    \"\"\"Plots data and a fitted curve.\n",
    "\n",
    "    ns: sequence of n (problem size)\n",
    "    ts: sequence of t (run time)\n",
    "    label: string label for the data curve\n",
    "    color: string color for the data curve\n",
    "    exp: exponent (slope) for the fitted curve\n",
    "    \"\"\"\n",
    "    tfit = fit(ns, ts, exp)\n",
    "    plt.plot(ns, tfit, color='0.7', linewidth=2, linestyle='dashed')\n",
    "    plt.plot(ns, ts, 's-', label=label, color=color, alpha=0.5, linewidth=3)\n",
    "    plt.xlabel('Problem size (n)')\n",
    "    plt.ylabel('Runtime (seconds)')\n",
    "    plt.xscale(scale)\n",
    "    plt.yscale(scale)\n",
    "    plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are the results from `append_list`.  When we plot `ts` versus `ns` on a log-log scale, we should get a straight line.  If the order of growth is linear, the slope of the line should be 1. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_timing_test(ns, ts, 'append_list', exp=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For small values of `n`, the runtime is so short that we're probably not getting an accurate measurement of just the operation we're interested in.  But as `n` increases, runtime seems to converge to a line with slope 1.  \n",
    "\n",
    "That suggests that performing append `n` times is linear, which suggests that a single append is constant time.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### list.pop\n",
    "\n",
    "Now let's try that with `pop`, and specifically with `pop(0)`, which pops from the left side of the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pop_left_list(n):\n",
    "    t= []\n",
    "    for i in range(n):\n",
    "        t.append(i)\n",
    "    for _ in range(n):\n",
    "        t.pop(0)\n",
    "    return t\n",
    "\n",
    "ns, ts = run_timing_test(pop_left_list)\n",
    "plot_timing_test(ns, ts, 'pop_left_list', exp=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That doesn't look very good.  The runtimes increase more steeply than the line with slope 1.  Let's try slope 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_timing_test(ns, ts, 'pop_left_list', exp=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last few points converge on the line with slope 2, which suggests that `pop(0)` is quadratic."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:**  What happens if you pop from the end of the list?  Write a function called `pop_right_list` that pops the last element instead of the first.  Use `run_timing_test` to characterize its performance.  Then use `plot_timing_test` with a few different values of `exp` to find the slope that best matches the data.  What conclusion can you draw about the order of growth for popping an element from the end of a list?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sorting\n",
    "\n",
    "We expect sorting to be `n log n`.  On a log-log scale, that doesn't look like a straight line, so there's no simple test whether it's really `n log n`.  Nevertheless, we can plot results for sorting lists with different lengths, and see what it looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sort_list(n):\n",
    "    t = np.random.random(n)\n",
    "    t.sort()\n",
    "    return t\n",
    "\n",
    "ns, ts = run_timing_test(sort_list)\n",
    "plot_timing_test(ns, ts, 'sort_list', exp=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It sure looks like sorting is linear, so that's surprising.  But remember that `log n` changes much more slowly than `n`.  Over a wide range of values, `n log n` can be hard to distinguish from an algorithm with linear growth.  As `n` gets bigger, we would expect this curve to be steeper than slope 1.  But often, for practical problem sizes, `n log n` is as good as linear."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bisection search\n",
    "\n",
    "We expect bisection search to be `log n`, which is so fast it is hard to measure the way we've been doing it.\n",
    "\n",
    "To make it work, I create the sorted list ahead of time and use the parameter `hi` to specify which part of the list to search.  Also, I have to run each search 100,000 times so it takes long enough to measure. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = np.random.random(16777216)\n",
    "t.sort()\n",
    "\n",
    "from bisect import bisect\n",
    "\n",
    "def search_sorted_list(n):\n",
    "    for i in range(100000):\n",
    "        index = bisect(t, 0.1, hi=n) \n",
    "    return index\n",
    "\n",
    "ns, ts = run_timing_test(search_sorted_list, max_time=0.2)\n",
    "plot_timing_test(ns, ts, 'search_sorted_list', exp=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like the runtime increases slowly as `n` increases, but it's definitely not linear.  To see if it's constant time, we can compare it to the line with slope 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_timing_test(ns, ts, 'search_sorted_list', exp=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nope, looks like it's not constant time, either.  We can't really conclude that it's `log n` based on this curve alone, but the results are certainly consistent with that theory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dictionary methods\n",
    "\n",
    "**Exercise:** Write methods called `add_dict` and `lookup_dict`, based on `append_list` and `pop_left_list`.  What is the order of growth for adding and looking up elements in a dictionary?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementing a hash table\n",
    "\n",
    "The reason Python dictionaries can add and look up elements in constant time is that they are based on hash tables.  In this section, we'll implement a hash table in Python.  Remember that this example is for educational purposes only.  There is no practical reason to write a hash table like this in Python.\n",
    "\n",
    "We'll start with a simple linear map, which is a list of key-value pairs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class LinearMap(object):\n",
    "    \"\"\"A simple implementation of a map using a list of tuples\n",
    "    where each tuple is a key-value pair.\"\"\"\n",
    "\n",
    "    def __init__(self):\n",
    "        self.items = []\n",
    "\n",
    "    def add(self, k, v):\n",
    "        \"\"\"Adds a new item that maps from key (k) to value (v).\n",
    "        Assumes that they keys are unique.\"\"\"\n",
    "        self.items.append((k, v))\n",
    "\n",
    "    def get(self, k):\n",
    "        \"\"\"Looks up the key (k) and returns the corresponding value,\n",
    "        or raises KeyError if the key is not found.\"\"\"\n",
    "        for key, val in self.items:\n",
    "            if key == k:\n",
    "                return val\n",
    "        raise KeyError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First let's make sure it works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def test_map(m):\n",
    "    s = string.ascii_lowercase\n",
    "\n",
    "    for k, v in enumerate(s):\n",
    "        m.add(k, v)\n",
    "\n",
    "    for k in range(len(s)):\n",
    "        print(k, m.get(k))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = LinearMap()\n",
    "test_map(m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's see how long it takes to add `n` elements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_linear_map(n):\n",
    "    d = LinearMap()\n",
    "    for i in range(n):\n",
    "        d.add(i, 1)\n",
    "    return d\n",
    "\n",
    "ns, ts = run_timing_test(add_linear_map)\n",
    "plot_timing_test(ns, ts, 'add_linear_map', exp=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Adding `n` elements is linear, so each add is constant time.  How about lookup?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def lookup_linear_map(n):\n",
    "    d = LinearMap()\n",
    "    for i in range(n):\n",
    "        d.add(i, 1)\n",
    "    total = 0\n",
    "    for i in range(n):\n",
    "        total += d.get(i)\n",
    "    return d\n",
    "\n",
    "ns, ts = run_timing_test(lookup_linear_map)\n",
    "plot_timing_test(ns, ts, 'lookup_linear_map', exp=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking up `n` elements is $O(n^2)$ (notice that `exp=2`).  So each lookup is linear.\n",
    "\n",
    "Let's see what happens if we break the list of key-value pairs into 100 lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class BetterMap(object):\n",
    "    \"\"\"A faster implementation of a map using a list of LinearMaps\n",
    "    and the built-in function hash() to determine which LinearMap\n",
    "    to put each key into.\"\"\"\n",
    "\n",
    "    def __init__(self, n=100):\n",
    "        \"\"\"Appends (n) LinearMaps onto (self).\"\"\"\n",
    "        self.maps = []\n",
    "        for i in range(n):\n",
    "            self.maps.append(LinearMap())\n",
    "\n",
    "    def find_map(self, k):\n",
    "        \"\"\"Finds the right LinearMap for key (k).\"\"\"\n",
    "        index = hash(k) % len(self.maps)\n",
    "        return self.maps[index]\n",
    "\n",
    "    def add(self, k, v):\n",
    "        \"\"\"Adds a new item to the appropriate LinearMap for key (k).\"\"\"\n",
    "        m = self.find_map(k)\n",
    "        m.add(k, v)\n",
    "\n",
    "    def get(self, k):\n",
    "        \"\"\"Finds the right LinearMap for key (k) and looks up (k) in it.\"\"\"\n",
    "        m = self.find_map(k)\n",
    "        return m.get(k)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = BetterMap()\n",
    "test_map(m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The run time is better (we get to a larger value of `n` before we run out of time). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def lookup_better_map(n):\n",
    "    d = BetterMap()\n",
    "    for i in range(n):\n",
    "        d.add(i, 1)\n",
    "    total = 0\n",
    "    for i in range(n):\n",
    "        total += d.get(i)\n",
    "    return d\n",
    "\n",
    "ns, ts = run_timing_test(lookup_better_map)\n",
    "plot_timing_test(ns, ts, 'lookup_better_map', exp=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The order of growth is hard to characterize.  It looks steeper than the line with slope 1.  Let's try slope 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_timing_test(ns, ts, 'lookup_better_map', exp=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It might be converging to the line with slope 2, but it's hard to say anything conclusive without running larger problem sizes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Go back and run `run_timing_test` with a larger value of `max_time` and see if the run time converges to the line with slope 2.  Just be careful not to make `max_time` to big."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we're ready for a complete implementation of a hash map."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class HashMap(object):\n",
    "    \"\"\"An implementation of a hashtable using a BetterMap\n",
    "    that grows so that the number of items never exceeds the number\n",
    "    of LinearMaps.\n",
    "\n",
    "    The amortized cost of add should be O(1) provided that the\n",
    "    implementation of sum in resize is linear.\"\"\"\n",
    "\n",
    "    def __init__(self):\n",
    "        \"\"\"Starts with 2 LinearMaps and 0 items.\"\"\"\n",
    "        self.maps = BetterMap(2)\n",
    "        self.num = 0\n",
    "\n",
    "    def get(self, k):\n",
    "        \"\"\"Looks up the key (k) and returns the corresponding value,\n",
    "        or raises KeyError if the key is not found.\"\"\"\n",
    "        return self.maps.get(k)\n",
    "\n",
    "    def add(self, k, v):\n",
    "        \"\"\"Resize the map if necessary and adds the new item.\"\"\"\n",
    "        if self.num == len(self.maps.maps):\n",
    "            self.resize()\n",
    "\n",
    "        self.maps.add(k, v)\n",
    "        self.num += 1\n",
    "\n",
    "    def resize(self):\n",
    "        \"\"\"Makes a new map, twice as big, and rehashes the items.\"\"\"\n",
    "        new_map = BetterMap(self.num * 2)\n",
    "\n",
    "        for m in self.maps.maps:\n",
    "            for k, v in m.items:\n",
    "                new_map.add(k, v)\n",
    "\n",
    "        self.maps = new_map"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = HashMap()\n",
    "test_map(m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Write a function called `lookup_hash_map`, based on `lookup_better_map`, and characterize its run time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If things have gone according to plan, the results should converge to a line with slope 1.  Which means that `n` lookups is linear, which means that each lookup is constant time.  Which is pretty much magic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
